Overview

Dataset Statistics

Number of Variables 15
Number of Rows 381109
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 142.5 MB
Average Row Size in Memory 392.0 B
Variable Types
  • Numerical: 8
  • Categorical: 7
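
The average row size follows from the two totals above. The sketch below assumes that the report's "MB" means 2**20 bytes; the source does not state the unit, and this reading is chosen only because it reproduces the reported figure.

```python
# Figures copied from the Dataset Statistics block above.
TOTAL_MB = 142.5
N_ROWS = 381_109

# Assumption: "MB" in the report means 2**20 bytes.
BYTES_PER_MB = 2 ** 20


def average_row_size(total_mb: float, n_rows: int) -> float:
    """Return the mean in-memory size of one row, in bytes."""
    return total_mb * BYTES_PER_MB / n_rows


avg_row_bytes = average_row_size(TOTAL_MB, N_ROWS)
print(f"{avg_row_bytes:.1f} B per row")
```

The result lands within rounding distance of the reported 392.0 B; an exact match is not expected because the 142.5 MB total is itself rounded.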

Dataset Insights

  • id is uniformly distributed
  • age is skewed
  • region_code is skewed
  • policy_sales_channel is skewed
  • annual_premium is skewed
  • premium_per_vintage is skewed
  • premium_per_age is skewed
  • response has constant length 1
  • insurance_age has constant length 1

Variables

id

numerical

Approximate Distinct Count 381109
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.8 MB
Mean 190555
Minimum 1
Maximum 381109
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • id is uniformly distributed

Quantile Statistics

Minimum 1
5-th Percentile 19232.1
Q1 95472
Median 190647
Q3 286115.5
95-th Percentile 362149.2
Maximum 381109
Range 381108
IQR 190643.5

Descriptive Statistics

Mean 190555
Standard Deviation 110016.8362
Variance 1.2104e+10
Sum 7.2622e+10
Skewness -2.8277e-19
Kurtosis -1.2
Coefficient of Variation 0.5773
  • id is not normally distributed (p-value 7.259388078140076e-05)
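
The descriptive statistics above are what one would expect if id is simply the sequence 1, 2, ..., N with N = 381109. That is an assumption, since the report shows only summaries. Under it, the closed forms for consecutive integers reproduce the reported mean, sum, standard deviation (in its sample, ddof = 1, form), coefficient of variation, and kurtosis. The quantiles are not checked here because the report's values appear to be approximate.

```python
import math

N = 381_109  # number of rows, from the overview

# Closed forms for the integers 1..N (assumed to be what id contains).
mean = (N + 1) / 2
total = N * (N + 1) // 2
sample_variance = N * (N + 1) / 12  # variance with ddof=1
sample_std = math.sqrt(sample_variance)
coeff_of_variation = sample_std / mean
# Excess kurtosis of a discrete uniform distribution on 1..N.
excess_kurtosis = -1.2 * (N**2 + 1) / (N**2 - 1)

print(f"mean={mean}, sum={total:.4e}")
print(f"std={sample_std:.4f}, cv={coeff_of_variation:.4f}")
print(f"kurtosis={excess_kurtosis:.4f}")
```

A coefficient of variation near 1/sqrt(3) and an excess kurtosis near -1.2 are the signatures of a uniform distribution, which fits the "uniformly distributed" insight.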

gender

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 25.4 MB

Length

Mean 4.9185
Standard Deviation 0.9967
Median 4
Minimum 4
Maximum 6

Sample

1st row Male
2nd row Female
3rd row Female
4th row Male
5th row Female

Letter

Count 1874476
Lowercase Letter 1493367
Space Separator 0
Uppercase Letter 381109
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Male, Female) take over 50.0%
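
Because "Male" has 4 letters and "Female" has 6, the letter total above is enough to recover the two category counts. A short sketch, assuming the column contains only those two spellings (which the distinct count of 2 and the length range of 4 to 6 suggest, but do not prove):

```python
N_ROWS = 381_109          # rows in the dataset
LETTER_COUNT = 1_874_476  # total letters in the gender column

LEN_MALE, LEN_FEMALE = len("Male"), len("Female")

# Solve the system
#   male + female         = N_ROWS
#   4 * male + 6 * female = LETTER_COUNT
female, remainder = divmod(
    LETTER_COUNT - LEN_MALE * N_ROWS, LEN_FEMALE - LEN_MALE
)
if remainder != 0:
    raise ValueError("letter total is inconsistent with two categories")
male = N_ROWS - female

mean_length = LETTER_COUNT / N_ROWS
print(male, female, round(mean_length, 4))
```

The same letter total divided by the row count gives the mean length reported in the Length block.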

age

numerical

Approximate Distinct Count 66
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.8 MB
Mean 38.8226
Minimum 20
Maximum 85
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • age is skewed right (γ1 = 0.6725)

Quantile Statistics

Minimum 20
5-th Percentile 21
Q1 25
Median 36
Q3 49
95-th Percentile 69
Maximum 85
Range 65
IQR 24

Descriptive Statistics

Mean 38.8226
Standard Deviation 15.5116
Variance 240.6101
Sum 1.4796e+07
Skewness 0.6725
Kurtosis -0.5657
Coefficient of Variation 0.3996
  • age is not normally distributed (p-value 7.191893600196644e-12)

region_code

numerical

Approximate Distinct Count 53
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.8 MB
Mean 26.3888
Minimum 0
Maximum 52
Zeros 2021
Zeros (%) 0.5%
Negatives 0
Negatives (%) 0.0%
  • region_code is skewed left (γ1 = -0.1153)

Quantile Statistics

Minimum 0
5-th Percentile 5
Q1 15
Median 28
Q3 35
95-th Percentile 47
Maximum 52
Range 52
IQR 20

Descriptive Statistics

Mean 26.3888
Standard Deviation 13.2299
Variance 175.0299
Sum 1.0057e+07
Skewness -0.1153
Kurtosis -0.8679
Coefficient of Variation 0.5013
  • region_code is not normally distributed (p-value 9.806845814393407e-22)

policy_sales_channel

numerical

Approximate Distinct Count 155
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.8 MB
Mean 112.0343
Minimum 1
Maximum 163
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • policy_sales_channel is skewed left (γ1 = -0.9)

Quantile Statistics

Minimum 1
5-th Percentile 26
Q1 26
Median 133
Q3 152
95-th Percentile 160
Maximum 163
Range 162
IQR 126

Descriptive Statistics

Mean 112.0343
Standard Deviation 54.204
Variance 2938.073
Sum 4.2697e+07
Skewness -0.9
Kurtosis -0.9708
Coefficient of Variation 0.4838
  • policy_sales_channel is not normally distributed (p-value 1.854436617615345e-16)

driving_license

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 24.7 MB
  • The largest value (Yes) is over 468.35 times larger than the second largest value (No)

Length

Mean 2.9979
Standard Deviation 0.04611
Median 3
Minimum 2
Maximum 3

Sample

1st row Yes
2nd row Yes
3rd row Yes
4th row Yes
5th row Yes

Letter

Count 1142515
Lowercase Letter 761406
Space Separator 0
Uppercase Letter 381109
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Yes, No) take over 50.0%

vehicle_age

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 28.8 MB

Length

Mean 14.1025
Standard Deviation 1.9974
Median 16
Minimum 12
Maximum 16

Sample

1st row between_1_2_year
2nd row between_1_2_year
3rd row between_1_2_year
4th row between_1_2_year
5th row below_1_year

Letter

Count 3830613
Lowercase Letter 3830613
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 581425
  • The top 2 categories (between_1_2_year, below_1_year) take over 50.0%

vehicle_damage

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 24.5 MB

Length

Mean 2.5049
Standard Deviation 0.5
Median 3
Minimum 2
Maximum 3

Sample

1st row Yes
2nd row Yes
3rd row Yes
4th row Yes
5th row No

Letter

Count 954631
Lowercase Letter 573522
Space Separator 0
Uppercase Letter 381109
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Yes, No) take over 50.0%

previously_insured

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 24.5 MB

Length

Mean 2.4582
Standard Deviation 0.4983
Median 2
Minimum 2
Maximum 3

Sample

1st row No
2nd row No
3rd row No
4th row No
5th row Yes

Letter

Count 936846
Lowercase Letter 555737
Space Separator 0
Uppercase Letter 381109
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (No, Yes) take over 50.0%

annual_premium

numerical

Approximate Distinct Count 48838
Approximate Unique (%) 12.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.8 MB
Mean 30564.3896
Minimum 2630
Maximum 540165
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • annual_premium is skewed right (γ1 = 1.7661)

Quantile Statistics

Minimum 2630
5-th Percentile 2630
Q1 24449
Median 31721
Q3 39493.5
95-th Percentile 55209
Maximum 540165
Range 537535
IQR 15044.5

Descriptive Statistics

Mean 30564.3896
Standard Deviation 17213.1551
Variance 2.9629e+08
Sum 1.1648e+10
Skewness 1.7661
Kurtosis 34.0041
Coefficient of Variation 0.5632
  • annual_premium is not normally distributed (p-value 2.270630572634235e-16)
  • annual_premium has 10170 outliers
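
The report does not say how outliers are counted. A common convention, assumed here and not confirmed by the source, is Tukey's rule: flag values more than 1.5 × IQR beyond the quartiles. The sketch below computes the fences that rule would give for the quartiles above; `tukey_fences` is an illustrative helper name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of annual_premium, copied from the report.
Q1, Q3 = 24449.0, 39493.5

lower, upper = tukey_fences(Q1, Q3)
print(f"IQR = {Q3 - Q1}, fences = ({lower}, {upper})")
```

Under that assumption the lower fence falls below the observed minimum of 2630, so every flagged value would sit in the right tail.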

vintage

numerical

Approximate Distinct Count 290
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.8 MB
Mean 154.3474
Minimum 10
Maximum 299
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • vintage is nearly symmetric, with a negligible right skew (γ1 = 0.003)

Quantile Statistics

Minimum 10
5-th Percentile 24
Q1 82
Median 154
Q3 227
95-th Percentile 285
Maximum 299
Range 289
IQR 145

Descriptive Statistics

Mean 154.3474
Standard Deviation 83.6713
Variance 7000.8871
Sum 5.8823e+07
Skewness 0.00303
Kurtosis -1.2007
Coefficient of Variation 0.5421
  • vintage is not normally distributed (p-value 0.000843392300743485)

response

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 24.0 MB
  • The largest value (0) is over 7.16 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 0
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 381109
  • The top 2 categories (0, 1) take over 50.0%
  • response has words of constant length
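
The imbalance ratio reported for response can be turned into approximate class shares. The sketch assumes exactly two classes and treats the rounded ratio of 7.16 as exact, so the counts are estimates only; `split_from_ratio` is an illustrative helper, not part of any profiling library.

```python
def split_from_ratio(n_rows: int, ratio: float) -> tuple[float, float]:
    """Estimate (majority, minority) counts for a two-class column.

    ratio is majority / minority, so minority = n_rows / (ratio + 1).
    """
    minority = n_rows / (ratio + 1)
    return n_rows - minority, minority


N_ROWS = 381_109

# response: class 0 is about 7.16 times as frequent as class 1.
resp_major, resp_minor = split_from_ratio(N_ROWS, 7.16)
minority_share = resp_minor / N_ROWS
print(f"response == 1: about {resp_minor:.0f} rows ({minority_share:.1%})")
```

In other words, roughly one row in eight carries a positive response.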

premium_per_vintage

numerical

Approximate Distinct Count 302632
Approximate Unique (%) 79.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.8 MB
Mean 363.7404
Minimum 8.796
Maximum 33717.2857
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • premium_per_vintage is skewed right (γ1 = 5.7414)

Quantile Statistics

Minimum 8.796
5-th Percentile 12.2897
Q1 117.5572
Median 195.6269
Q3 376.2297
95-th Percentile 1347.5896
Maximum 33717.2857
Range 33708.4897
IQR 258.6725

Descriptive Statistics

Mean 363.7404
Standard Deviation 548.6531
Variance 301020.2488
Sum 1.3862e+08
Skewness 5.7414
Kurtosis 107.7084
Coefficient of Variation 1.5084
  • premium_per_vintage is not normally distributed (p-value 6.587431711881458e-25)
  • premium_per_vintage has 41545 outliers
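
The column name suggests a derived feature: annual_premium divided by vintage. That definition is an assumption, since the report does not state it, but it is consistent with the extremes reported elsewhere: the smallest premium (2630) over the largest vintage (299) reproduces the minimum shown above.

```python
def premium_per_vintage(annual_premium: float, vintage: int) -> float:
    """Assumed definition of the derived feature: premium / vintage."""
    if vintage <= 0:
        raise ValueError("vintage must be positive")
    return annual_premium / vintage


# Extremes copied from the annual_premium and vintage sections.
MIN_PREMIUM, MAX_VINTAGE = 2630, 299

smallest = premium_per_vintage(MIN_PREMIUM, MAX_VINTAGE)
print(round(smallest, 3))
```

If the definition holds, small vintage values inflate the ratio sharply, which would be consistent with the heavy right tail (skewness 5.7414) in this section.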

insurance_age

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 24.0 MB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 2
2nd row 2
3rd row 4
4th row 2
5th row 5

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 381109
  • The top 2 categories (5, 2) take over 50.0%
  • insurance_age has words of constant length

premium_per_age

numerical

Approximate Distinct Count 196
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.8 MB
Mean 30564.3896
Minimum 20003
Maximum 54171.3333
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • premium_per_age is skewed left (γ1 = -0.1286)

Quantile Statistics

Minimum 20003
5-th Percentile 27125.7589
Q1 29676.1469
Median 30803.9148
Q3 31532.6507
95-th Percentile 34002.1991
Maximum 54171.3333
Range 34168.3333
IQR 1856.5037

Descriptive Statistics

Mean 30564.3896
Standard Deviation 1867.5044
Variance 3.4876e+06
Sum 1.1648e+10
Skewness -0.1286
Kurtosis 0.5343
Coefficient of Variation 0.0611
  • premium_per_age is not normally distributed (p-value 2.4541770896566714e-13)
  • premium_per_age has 18423 outliers

Interactions

Correlations

Missing Values